@casparvl
Collaborator

casparvl commented Dec 24, 2025

Many things still need to be done. The software-layer-scripts PR should make sure to:

  • Run CUDA 12.6-based builds with --module-only if they target CC100 or above.
  • Change the requested CUDA compute capabilities for cuDNN wherever that is needed. For example, cuDNN 9.5.0 comes, I think, only with 9.0 device code, not 9.0a, so we should change the requested CC to 9.0 for that particular software name and version. For cuDNN 9.10.1.4, I think 9.0a is supported but 10.0f is not, so it should be changed to 10.0. I'd prefer to make those changes in hooks, to avoid having to open multiple software-layer PRs, each with custom build options. An added advantage of doing it in the hooks is that it also fixes things for EESSI-extend-based installations.
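
The --module-only rule in the first bullet could be captured as a small predicate. This is only a sketch: `parse_cc` and `needs_module_only` are hypothetical names, the input formats are modelled on the `accel=nvidia/cc100` labels used in the bot commands, and the real decision would live in the software-layer-scripts PR.

```python
def parse_cc(accel: str) -> int:
    """Turn a label such as 'nvidia/cc100' or 'cc90' into the integer 100 or 90."""
    label = accel.rsplit("/", 1)[-1]
    if not label.startswith("cc"):
        raise ValueError(f"unexpected accelerator label: {accel!r}")
    return int(label[2:])


def needs_module_only(cuda_version: str, accel: str) -> bool:
    """True for CUDA 12.6.x-based builds that target CC100 or above (hypothetical helper)."""
    major, minor = cuda_version.split(".")[:2]
    return (major, minor) == ("12", "6") and parse_cc(accel) >= 100
```

Under these assumptions, a CUDA 12.6.0 build for `nvidia/cc100` would be flagged, while the same build for `nvidia/cc90`, or a CUDA 12.8.0 build for any target, would not.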

Edit 08-01:

  • cuDNN-9.5.0.50 indeed contains device code for 7.0, 8.0 and 9.0, but not for 9.0a, which causes the sanity check to fail. So we should convert 9.0a to 9.0 (in a hook?) for this version.
  • cuDNN-9.10.1.4 contains device code for 7.0, 8.0, 9.0a, 10.0 and 12.0, but not for 10.0f and 12.0f, so the suffix needs to be stripped for those as well.
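
The conversion described in these two bullets might look roughly like the following. This is a sketch under stated assumptions, not the actual hook: `CUDNN_DEVICE_CODE` and `adjust_ccs` are hypothetical names, the table contains only the device-code lists quoted above, and wiring it into eb_hooks.py is left out.

```python
# Device code shipped per cuDNN version, copied from the notes above.
CUDNN_DEVICE_CODE = {
    "9.5.0.50": {"7.0", "8.0", "9.0"},
    "9.10.1.4": {"7.0", "8.0", "9.0a", "10.0", "12.0"},
}


def adjust_ccs(cudnn_version, requested):
    """Strip an 'a'/'f' suffix from CCs that this cuDNN has no device code for."""
    supported = CUDNN_DEVICE_CODE.get(cudnn_version)
    if supported is None:
        return list(requested)  # unknown version: leave the request untouched
    adjusted = []
    for cc in requested:
        base = cc.rstrip("af")  # '9.0a' -> '9.0', '10.0f' -> '10.0'
        if cc not in supported and base in supported:
            adjusted.append(base)
        else:
            adjusted.append(cc)
    return adjusted
```

Keeping the mapping in one table means a new cuDNN version only needs a new entry, which fits the stated preference for fixing this in hooks instead of per-PR build options.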

This PR should replace #1278, #1286 and #1287.

casparvl added the 2025.06-software.eessi.io label (2025.06 version of software.eessi.io) Dec 24, 2025
casparvl changed the title from "{2025.05}[SYSTEM] CUDA 12.6.0,12.8.0, cuDNN 9.5.0.50,9.10.1.4" to "{2025.06}[SYSTEM] CUDA 12.6.0,12.8.0, cuDNN 9.5.0.50,9.10.1.4" Jan 6, 2026
@casparvl
Collaborator Author

Let's first try this for a single CPU arch for each supported CC.

Native builds:

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/intel/cascadelake,accel=nvidia/cc70

Cross compiles:

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen4 for:arch=x86_64/amd/zen4,accel=nvidia/cc100
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen4 for:arch=x86_64/amd/zen4,accel=nvidia/cc120
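
For readers unfamiliar with the command syntax, the build lines above break down into `key:value` arguments, where `on:` and `for:` carry comma-separated `key=value` pairs. The sketch below only illustrates that structure; `parse_build_command` is a hypothetical helper and not the bot's own parser.

```python
def parse_build_command(line):
    """Split a 'bot: build ...' line into a dict of its arguments."""
    prefix = "bot: build"
    if not line.startswith(prefix):
        raise ValueError("not a bot build command")
    args = {}
    for token in line[len(prefix):].split():
        key, _, value = token.partition(":")
        if key in ("on", "for"):
            # e.g. 'arch=x86_64/amd/zen4,accel=nvidia/cc100'
            value = dict(item.split("=", 1) for item in value.split(","))
        args[key] = value
    return args
```

In the commands above, the native builds have no `on:` argument, while the cross-compiles name a build architecture with `on:arch=zen4` and a different accelerator target in `for:`.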

@eessi-bot-surf

eessi-bot-surf bot commented Jan 12, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.01/pr_1351/18254928

date job status comment
Jan 12 13:05:56 UTC 2026 submitted job id 18254928 will be eligible to start in about 20 seconds
Jan 12 13:06:10 UTC 2026 received job awaits launch by Slurm scheduler
Jan 12 13:06:27 UTC 2026 running job 18254928 is running
Jan 12 13:21:30 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-18254928.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-icelake-accel-nvidia-cc80-17682239280.tar.zst
size: 5717 MiB (5995616078 bytes)
entries: 12678
modules under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
cuDNN/9.10.1.4-CUDA-12.8.0.lua
cuDNN/9.5.0.50-CUDA-12.6.0.lua
software under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/software
CUDA/12.6.0
CUDA/12.8.0
cuDNN/9.10.1.4-CUDA-12.8.0
cuDNN/9.5.0.50-CUDA-12.6.0
reprod directories under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/reprod
CUDA/12.6.0/20260112_131018UTC
CUDA/12.8.0/20260112_131403UTC
cuDNN/9.10.1.4-CUDA-12.8.0/20260112_131835UTC
cuDNN/9.5.0.50-CUDA-12.6.0/20260112_131606UTC
other under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80
no other files in tarball
Jan 12 13:21:30 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 0/0 test case(s) from 0 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-18254928.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
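
The build checks in the report above amount to a list of patterns that must either be absent from or present in the job output file. A minimal sketch of that idea follows; `BUILD_CHECKS` and `check_build_log` are hypothetical names, and treating the reported strings as regular expressions is an assumption, not the bot's confirmed behaviour.

```python
import re

# (pattern, must_be_present) pairs mirroring the build checks listed above.
BUILD_CHECKS = [
    (r"FATAL:", False),
    (r"ERROR:", False),
    (r"FAILED:", False),
    (r"required modules missing:", False),
    (r"No missing installations", True),
    (r".tar.* created!", True),
]


def check_build_log(text):
    """Return (ok, failed_patterns) for the contents of a job output file."""
    failed = [
        pattern
        for pattern, must_be_present in BUILD_CHECKS
        if bool(re.search(pattern, text)) != must_be_present
    ]
    return (not failed, failed)
```

A log counts as successful only when every check holds, which is why a single line matching `ERROR:` is enough to turn a report into a FAILURE even when a tarball was created.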

@eessi-bot-aws

eessi-bot-aws bot commented Jan 12, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4 and accelerator nvidia/cc100
Job dir: /project/def-users/SHARED/jobs/2026.01/pr_1351/121031

date job status comment
Jan 12 13:06:02 UTC 2026 submitted job id 121031 awaits release by job manager
Jan 12 13:06:07 UTC 2026 released job awaits launch by Slurm scheduler
Jan 12 13:12:13 UTC 2026 running job 121031 is running
Jan 12 13:25:47 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-121031.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc100-17682240190.tar.zst
size: 4673 MiB (4900258367 bytes)
entries: 12508
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100/software
CUDA/12.6.0
CUDA/12.8.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100/reprod
CUDA/12.6.0/20260112_131403UTC
CUDA/12.8.0/20260112_131725UTC
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100
no other files in tarball
Jan 12 13:25:47 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen4+default
P: latency: 1.47 us (r:0, l:None, u:None)
[ OK ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen4+default
P: latency: 3.22 us (r:0, l:None, u:None)
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen4+default
P: latency: 0.14 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen4+default
P: bandwidth: 14144.38 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-121031.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-surf

eessi-bot-surf bot commented Jan 12, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.01/pr_1351/18254930

date job status comment
Jan 12 13:06:01 UTC 2026 submitted job id 18254930 will be eligible to start in about 20 seconds
Jan 12 13:06:14 UTC 2026 received job awaits launch by Slurm scheduler
Jan 12 13:06:31 UTC 2026 running job 18254930 is running
Jan 12 13:14:47 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-18254930.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc90-17682235690.tar.zst
size: 4676 MiB (4903415931 bytes)
entries: 12508
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
CUDA/12.6.0
CUDA/12.8.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
CUDA/12.6.0/20260112_130859UTC
CUDA/12.8.0/20260112_131113UTC
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
no other files in tarball
Jan 12 13:14:47 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 0/0 test case(s) from 0 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-18254930.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Jan 12, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4 and accelerator nvidia/cc120
Job dir: /project/def-users/SHARED/jobs/2026.01/pr_1351/121032

date job status comment
Jan 12 13:06:07 UTC 2026 submitted job id 121032 awaits release by job manager
Jan 12 13:07:10 UTC 2026 released job awaits launch by Slurm scheduler
Jan 12 13:12:15 UTC 2026 running job 121032 is running
Jan 12 13:25:50 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-121032.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc120-17682240110.tar.zst
size: 4673 MiB (4900258259 bytes)
entries: 12508
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120/software
CUDA/12.6.0
CUDA/12.8.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120/reprod
CUDA/12.6.0/20260112_131402UTC
CUDA/12.8.0/20260112_131721UTC
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120
no other files in tarball
Jan 12 13:25:50 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen4+default
P: latency: 1.5 us (r:0, l:None, u:None)
[ OK ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen4+default
P: latency: 3.27 us (r:0, l:None, u:None)
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen4+default
P: latency: 0.15 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen4+default
P: bandwidth: 14082.43 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-121032.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@laraPPr
Collaborator

laraPPr commented Jan 12, 2026

bot: help

@eessi-bot-surf

eessi-bot-surf bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot
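
The rules in this help text (one command per line, each starting at the beginning of a line with `bot: COMMAND [ARGUMENTS]*`) can be illustrated with a short extraction routine. This is a sketch only: `extract_commands`, `SUPPORTED` and the regular expression are hypothetical and do not reproduce the bot's implementation.

```python
import re

# Commands listed as supported in the help text above.
SUPPORTED = {"help", "build", "show_config", "status"}
# A command must start at the beginning of a line.
COMMAND_RE = re.compile(r"^bot:[ \t]*(\S+)[ \t]*(.*)$", re.MULTILINE)


def extract_commands(comment):
    """Return (command, arguments) pairs for each supported 'bot:' line."""
    return [
        (name, args.strip())
        for name, args in COMMAND_RE.findall(comment)
        if name in SUPPORTED
    ]
```

Because matching is anchored to the start of each line, a `bot:` mention in the middle of a sentence is ignored, and a single comment can still carry several commands.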

@gpu-bot-ugent

gpu-bot-ugent bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-aws

eessi-bot-aws bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-jsc

eessi-bot-jsc bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-jsc (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-rug

eessi-bot-rug bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-rug (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@laraPPr
Collaborator

laraPPr commented Jan 12, 2026

bot: help

@eessi-bot-aws

eessi-bot-aws bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@gpu-bot-ugent

gpu-bot-ugent bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-surf

eessi-bot-surf bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-jsc

eessi-bot-jsc bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-jsc (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-rug

eessi-bot-rug bot commented Jan 12, 2026

Updates by the bot instance eessi-bot-rug (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@casparvl
Collaborator Author

So... the updated hooks were not picked up, because the eb_hooks.py from the repository (/cvmfs/...) was used instead of the one from the cloned software-layer-scripts. EESSI/software-layer-scripts@e69b665 changes that, so the build should now use the eb_hooks.py from software-layer-scripts.

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jan 12, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.01/pr_1351/18258814

date job status comment
Jan 12 15:35:25 UTC 2026 submitted job id 18258814 will be eligible to start in about 20 seconds
Jan 12 15:35:36 UTC 2026 received job awaits launch by Slurm scheduler
Jan 12 15:35:50 UTC 2026 running job 18258814 is running
Jan 12 15:44:08 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-18258814.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc90-17682325310.tar.zst
size: 4676 MiB (4903782660 bytes)
entries: 12508
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
CUDA/12.6.0
CUDA/12.8.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
CUDA/12.6.0/20260112_153821UTC
CUDA/12.8.0/20260112_154035UTC
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
no other files in tarball
Jan 12 15:44:08 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 0/0 test case(s) from 0 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-18258814.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Collaborator Author

Didn't work, for some reason. Adding more debugging output:

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jan 12, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.01/pr_1351/18259566

date job status comment
Jan 12 15:55:49 UTC 2026 submitted job id 18259566 will be eligible to start in about 20 seconds
Jan 12 15:55:55 UTC 2026 received job awaits launch by Slurm scheduler
Jan 12 15:56:18 UTC 2026 running job 18259566 is running
Jan 12 16:04:29 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-18259566.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc90-17682337500.tar.zst
size: 4676 MiB (4903380847 bytes)
entries: 12508
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
CUDA/12.6.0
CUDA/12.8.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
CUDA/12.6.0/20260112_155846UTC
CUDA/12.8.0/20260112_160053UTC
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
no other files in tarball
Jan 12 16:04:29 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 0/0 test case(s) from 0 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-18259566.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Collaborator Author

casparvl commented Jan 12, 2026

Forgot to push the change to add debugging output. Retry:

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jan 12, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.01/pr_1351/18260209

date job status comment
Jan 12 16:12:41 UTC 2026 submitted job id 18260209 will be eligible to start in about 20 seconds
Jan 12 16:12:55 UTC 2026 received job awaits launch by Slurm scheduler
Jan 12 16:13:09 UTC 2026 running job 18260209 is running
Jan 12 16:17:17 UTC 2026 finished
🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job18260209.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Jan 12 16:17:17 UTC 2026 test result
🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job18260209.test does not exist in job directory, or parsing it failed.

@casparvl
Collaborator Author

OK, reloading the EasyBuild module was overwriting the change. Retrying with EASYBUILD_HOOKS set later, after the module reload.
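
A guard along the following lines could catch this kind of silent override before a build starts. It is a sketch under assumptions: `hooks_from_checkout` is a hypothetical helper, the paths in the usage note are made up, and the only detail taken from this thread is that the hooks file is selected through the EASYBUILD_HOOKS environment variable.

```python
import os
from pathlib import Path


def hooks_from_checkout(checkout_dir, env=None):
    """True if EASYBUILD_HOOKS points at a file inside checkout_dir."""
    env = os.environ if env is None else env
    hooks = env.get("EASYBUILD_HOOKS")
    if not hooks:
        return False
    hooks_path = Path(hooks).resolve()
    # The checkout must be one of the ancestors of the hooks file.
    return Path(checkout_dir).resolve() in hooks_path.parents
```

For example, with a hypothetical checkout at `/tmp/job/software-layer-scripts`, the check passes when EASYBUILD_HOOKS is `/tmp/job/software-layer-scripts/eb_hooks.py` and fails when the variable points somewhere else or has been unset by a later module load.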

@casparvl
Collaborator Author

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jan 12, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.01/pr_1351/18260388

date job status comment
Jan 12 16:18:09 UTC 2026 submitted job id 18260388 will be eligible to start in about 20 seconds
Jan 12 16:18:22 UTC 2026 received job awaits launch by Slurm scheduler
Jan 12 16:18:35 UTC 2026 running job 18260388 is running
Jan 12 16:28:46 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-18260388.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc90-17682352000.tar.zst
size: 5717 MiB (5995055028 bytes)
entries: 12678
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
cuDNN/9.10.1.4-CUDA-12.8.0.lua
cuDNN/9.5.0.50-CUDA-12.6.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
CUDA/12.6.0
CUDA/12.8.0
cuDNN/9.10.1.4-CUDA-12.8.0
cuDNN/9.5.0.50-CUDA-12.6.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
CUDA/12.6.0/20260112_162102UTC
CUDA/12.8.0/20260112_162313UTC
cuDNN/9.10.1.4-CUDA-12.8.0/20260112_162629UTC
cuDNN/9.5.0.50-CUDA-12.6.0/20260112_162441UTC
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
no other files in tarball
Jan 12 16:28:46 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 0/0 test case(s) from 0 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-18260388.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Collaborator Author

Let's try the others again too, to make sure they (still) work:

Native builds:

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/intel/cascadelake,accel=nvidia/cc70

Cross compiles:

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen4 for:arch=x86_64/amd/zen4,accel=nvidia/cc100
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen4 for:arch=x86_64/amd/zen4,accel=nvidia/cc120

@eessi-bot-surf

eessi-bot-surf bot commented Jan 12, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.01/pr_1351/18260777

date job status comment
Jan 12 16:32:45 UTC 2026 submitted job id 18260777 will be eligible to start in about 20 seconds
Jan 12 16:32:53 UTC 2026 received job awaits launch by Slurm scheduler
Jan 12 16:33:11 UTC 2026 running job 18260777 is running
Jan 12 16:48:00 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-18260777.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-icelake-accel-nvidia-cc80-17682363310.tar.zst
size: 5717 MiB (5995477898 bytes)
entries: 12678
modules under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
cuDNN/9.10.1.4-CUDA-12.8.0.lua
cuDNN/9.5.0.50-CUDA-12.6.0.lua
software under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/software
CUDA/12.6.0
CUDA/12.8.0
cuDNN/9.10.1.4-CUDA-12.8.0
cuDNN/9.5.0.50-CUDA-12.6.0
reprod directories under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/reprod
CUDA/12.6.0/20260112_163700UTC
CUDA/12.8.0/20260112_164046UTC
cuDNN/9.10.1.4-CUDA-12.8.0/20260112_164518UTC
cuDNN/9.5.0.50-CUDA-12.6.0/20260112_164249UTC
other under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80
no other files in tarball
Jan 12 16:48:00 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 0/0 test case(s) from 0 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-18260777.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@gpu-bot-ugent

gpu-bot-ugent bot commented Jan 12, 2026

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2025.06-software
Building on: intel-cascadelake and accelerator nvidia/cc70
Building for: x86_64/intel/cascadelake and accelerator nvidia/cc70
Job dir: /scratch/gent/vo/002/gvo00211/SHARED/jobs/2026.01/pr_1351/40772315

date job status comment
Jan 12 16:32:47 UTC 2026 submitted job id 40772315 awaits release by job manager
Jan 12 16:34:05 UTC 2026 released job awaits launch by Slurm scheduler
Jan 12 16:36:09 UTC 2026 running job 40772315 is running
Jan 12 16:50:28 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-40772315.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-cascadelake-accel-nvidia-cc70-17682364930.tar.zst
size: 5711 MiB (5989155876 bytes)
entries: 12678
modules under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
cuDNN/9.10.1.4-CUDA-12.8.0.lua
cuDNN/9.5.0.50-CUDA-12.6.0.lua
software under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/software
CUDA/12.6.0
CUDA/12.8.0
cuDNN/9.10.1.4-CUDA-12.8.0
cuDNN/9.5.0.50-CUDA-12.6.0
reprod directories under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/reprod
CUDA/12.6.0/20260112_163725UTC
CUDA/12.8.0/20260112_164027UTC
cuDNN/9.10.1.4-CUDA-12.8.0/20260112_164758UTC
cuDNN/9.5.0.50-CUDA-12.6.0/20260112_164209UTC
other under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70
no other files in tarball
Jan 12 16:50:28 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-40772315.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Jan 12, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4 and accelerator nvidia/cc100
Job dir: /project/def-users/SHARED/jobs/2026.01/pr_1351/121033

date job status comment
Jan 12 16:32:50 UTC 2026 submitted job id 121033 awaits release by job manager
Jan 12 16:33:12 UTC 2026 released job awaits launch by Slurm scheduler
Jan 12 16:38:18 UTC 2026 running job 121033 is running
Jan 12 16:48:52 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-121033.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc100-17682362730.tar.zst
size: 3389 MiB (3554016637 bytes)
entries: 6476
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
cuDNN/9.10.1.4-CUDA-12.8.0.lua
cuDNN/9.5.0.50-CUDA-12.6.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100/software
CUDA/12.6.0
CUDA/12.8.0
cuDNN/9.10.1.4-CUDA-12.8.0
cuDNN/9.5.0.50-CUDA-12.6.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100/reprod
CUDA/12.6.0/20260112_163835UTC
CUDA/12.8.0/20260112_164148UTC
cuDNN/9.10.1.4-CUDA-12.8.0/20260112_164340UTC
cuDNN/9.5.0.50-CUDA-12.6.0/20260112_164153UTC
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100
no other files in tarball
Jan 12 16:48:52 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen4+default
P: latency: 1.4 us (r:0, l:None, u:None)
[ OK ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen4+default
P: latency: 3.3 us (r:0, l:None, u:None)
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen4+default
P: latency: 0.14 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen4+default
P: bandwidth: 14243.88 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-121033.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-surf

eessi-bot-surf bot commented Jan 12, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.01/pr_1351/18260778

date job status comment
Jan 12 16:32:51 UTC 2026 submitted job id 18260778 will be eligible to start in about 20 seconds
Jan 12 16:32:58 UTC 2026 received job awaits launch by Slurm scheduler
Jan 12 16:33:27 UTC 2026 running job 18260778 is running
Jan 12 16:43:24 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-18260778.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc90-17682360750.tar.zst
size: 5717 MiB (5994873728 bytes)
entries: 12678
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
cuDNN/9.10.1.4-CUDA-12.8.0.lua
cuDNN/9.5.0.50-CUDA-12.6.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
CUDA/12.6.0
CUDA/12.8.0
cuDNN/9.10.1.4-CUDA-12.8.0
cuDNN/9.5.0.50-CUDA-12.6.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
CUDA/12.6.0/20260112_163544UTC
CUDA/12.8.0/20260112_163752UTC
cuDNN/9.10.1.4-CUDA-12.8.0/20260112_164104UTC
cuDNN/9.5.0.50-CUDA-12.6.0/20260112_163919UTC
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
no other files in tarball
Jan 12 16:43:24 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 0/0 test case(s) from 0 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-18260778.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Jan 12, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4 and accelerator nvidia/cc120
Job dir: /project/def-users/SHARED/jobs/2026.01/pr_1351/121034

date job status comment
Jan 12 16:32:55 UTC 2026 submitted job id 121034 awaits release by job manager
Jan 12 16:33:09 UTC 2026 released job awaits launch by Slurm scheduler
Jan 12 16:38:16 UTC 2026 running job 121034 is running
Jan 12 16:48:50 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-121034.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc120-17682362730.tar.zst
size: 3389 MiB (3554187254 bytes)
entries: 6476
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
cuDNN/9.10.1.4-CUDA-12.8.0.lua
cuDNN/9.5.0.50-CUDA-12.6.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120/software
CUDA/12.6.0
CUDA/12.8.0
cuDNN/9.10.1.4-CUDA-12.8.0
cuDNN/9.5.0.50-CUDA-12.6.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120/reprod
CUDA/12.6.0/20260112_163835UTC
CUDA/12.8.0/20260112_164153UTC
cuDNN/9.10.1.4-CUDA-12.8.0/20260112_164344UTC
cuDNN/9.5.0.50-CUDA-12.6.0/20260112_164200UTC
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120
no other files in tarball
Jan 12 16:48:50 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen4+default
P: latency: 1.41 us (r:0, l:None, u:None)
[ OK ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen4+default
P: latency: 3.28 us (r:0, l:None, u:None)
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen4+default
P: latency: 0.15 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen4+default
P: bandwidth: 14111.78 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-121034.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Collaborator Author

The last failure was because of:

DEBUG: after loading EESSI-extend //  EASYBUILD_INSTALLPATH='/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100'
Going to install full CUDA SDK and cu* libraries under host_injections if necessary
Created temporary directory '/tmp/tmp.P9bQJMxmDu/temp_install_storage/cuda_n_co.wtA'
Processing easystack file ...

>> Found an EasyBuild/5.1.2 module

The following have been reloaded with a version change:
  1) EasyBuild/5.2.0 => EasyBuild/5.1.2

>> Module for EESSI-extend/2025.06-easybuild found!
-- Using /tmp/$USER as a temporary working directory for installations, you can
override this by setting the environment variable WORKING_DIR and reloading the
module (e.g., /dev/shm is a common option)
-- To create installations for EESSI, you _must_ have write permissions to
/cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100
-- You may wish to configure a sources directory for EasyBuild (for example,
via setting the environment variable EASYBUILD_SOURCEPATH) to allow you to
reuse existing sources for packages.
set MODULEPATH=/cvmfs/software.eessi.io/host_injections/x86_64/.modules/all
Show EasyBuild configuration
ERROR: Failed to parse configuration options: "Found problems validating the options: Incorrect values in --cuda-compute-capabilities (expected pattern: '^[0-9]+\\.[0-9]+a?$'): 10.0f"
>> Determining if packages specified in /cvmfs/software.eessi.io/versions/2025.06/scripts/gpu_support/nvidia/easystacks/eessi-2025.06-eb-5.1.2-CUDA-host-injections.yml are missing under /cvmfs/software.eessi.io/host_injections/2025.06/software/linux/x86_64/amd/zen4
ERROR: Failed to parse configuration options: "Found problems validating the options: Incorrect values in --cuda-compute-capabilities (expected pattern: '^[0-9]+\\.[0-9]+a?$'): 10.0f"
number of packages to be (re-)installed: '0'
Running eb --prefix=/tmp/tmp.P9bQJMxmDu/temp_install_storage/cuda_n_co.wtA --installpath-modules=/cvmfs/software.eessi.io/host_injections/x86_64/.modules --hooks=/tmp/tmp.P9bQJMxmDu/temp_install_storage/cuda_n_co.wtA/none.py --easystack /cvmfs/software.eessi.io/versions/2025.06/scripts/gpu_support/nvidia/easystacks/eessi-2025.06-eb-5.1.2-CUDA-host-injections.yml --accept-eula-for=CUDA,cuDNN
ERROR: Failed to parse configuration options: "Found problems validating the options: Incorrect values in --cuda-compute-capabilities (expected pattern: '^[0-9]+\\.[0-9]+a?$'): 10.0f"
ERROR: Failed to parse configuration options: "Found problems validating the options: Incorrect values in --cuda-compute-capabilities (expected pattern: '^[0-9]+\\.[0-9]+a?$'): 10.0f"
cp: missing destination file operand after '.'
Try 'cp --help' for more information.
basename: missing operand
Try 'basename --help' for more information.
ERROR: some installation failed, please check EasyBuild logs /eessi_bot_job/...
No 'nvidia-smi' found, no available GPU.

I.e. this is the step where we try to install the CUDA SDKs under host_injections. That step uses EB 5.1.2, which doesn't support the f suffix in --cuda-compute-capabilities=10.0f: its validation pattern only allows an optional a suffix. The easiest solution is probably to update the EB version used to install the CUDA SDKs.
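
To make the rejection concrete, here is a minimal Python sketch of the check. The first pattern is copied from the error message in the log above (the doubled backslash there is just string escaping). The second, relaxed pattern and the helper name `invalid_capabilities` are assumptions for illustration only, not necessarily what a newer EasyBuild uses.

```python
import re

# Pattern as reported in the EasyBuild 5.1.2 error message above.
EB_512_PATTERN = re.compile(r'^[0-9]+\.[0-9]+a?$')

# Hypothetical relaxed pattern that also accepts an 'f' suffix.
# This is an assumption for illustration, not taken from EasyBuild itself.
RELAXED_PATTERN = re.compile(r'^[0-9]+\.[0-9]+[af]?$')


def invalid_capabilities(values, pattern):
    """Return the compute capabilities that do not match the given pattern."""
    return [value for value in values if not pattern.match(value)]


requested = ['9.0a', '10.0f']
print(invalid_capabilities(requested, EB_512_PATTERN))
print(invalid_capabilities(requested, RELAXED_PATTERN))
```

With the 5.1.2 pattern only 10.0f is flagged, which matches the "Incorrect values in --cuda-compute-capabilities" error in the log.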

@casparvl
Collaborator Author

Trying again with this fix: EESSI/software-layer-scripts@8292fa3

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen4 for:arch=x86_64/amd/zen4,accel=nvidia/cc100
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen4 for:arch=x86_64/amd/zen4,accel=nvidia/cc120

@eessi-bot-aws

eessi-bot-aws bot commented Jan 12, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4 and accelerator nvidia/cc100
Job dir: /project/def-users/SHARED/jobs/2026.01/pr_1351/121035

date job status comment
Jan 12 17:03:08 UTC 2026 submitted job id 121035 awaits release by job manager
Jan 12 17:04:00 UTC 2026 released job awaits launch by Slurm scheduler
Jan 12 17:05:06 UTC 2026 running job 121035 is running
Jan 12 17:15:41 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-121035.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc100-17682378980.tar.zst
size: 3390 MiB (3554720700 bytes)
entries: 6477
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
cuDNN/9.10.1.4-CUDA-12.8.0.lua
cuDNN/9.5.0.50-CUDA-12.6.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100/software
CUDA/12.6.0
CUDA/12.8.0
cuDNN/9.10.1.4-CUDA-12.8.0
cuDNN/9.5.0.50-CUDA-12.6.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100/reprod
CUDA/12.6.0/20260112_170439UTC
CUDA/12.8.0/20260112_170906UTC
cuDNN/9.10.1.4-CUDA-12.8.0/20260112_171056UTC
cuDNN/9.5.0.50-CUDA-12.6.0/20260112_170909UTC
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc100
2025.06/scripts/gpu_support/nvidia/easystacks/eessi-2025.06-eb-5.2.0-CUDA-host-injections.yml
Jan 12 17:15:41 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen4+default
P: latency: 1.45 us (r:0, l:None, u:None)
[ OK ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen4+default
P: latency: 3.19 us (r:0, l:None, u:None)
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen4+default
P: latency: 0.15 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen4+default
P: bandwidth: 14387.73 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-121035.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@eessi-bot-aws

eessi-bot-aws bot commented Jan 12, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4 and accelerator nvidia/cc120
Job dir: /project/def-users/SHARED/jobs/2026.01/pr_1351/121036

date job status comment
Jan 12 17:03:12 UTC 2026 submitted job id 121036 awaits release by job manager
Jan 12 17:03:58 UTC 2026 released job awaits launch by Slurm scheduler
Jan 12 17:05:03 UTC 2026 running job 121036 is running
Jan 12 17:15:39 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-121036.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc120-17682379130.tar.zst
size: 3389 MiB (3554526100 bytes)
entries: 6477
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120/modules/all
CUDA/12.6.0.lua
CUDA/12.8.0.lua
cuDNN/9.10.1.4-CUDA-12.8.0.lua
cuDNN/9.5.0.50-CUDA-12.6.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120/software
CUDA/12.6.0
CUDA/12.8.0
cuDNN/9.10.1.4-CUDA-12.8.0
cuDNN/9.5.0.50-CUDA-12.6.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120/reprod
CUDA/12.6.0/20260112_170437UTC
CUDA/12.8.0/20260112_170910UTC
cuDNN/9.10.1.4-CUDA-12.8.0/20260112_171101UTC
cuDNN/9.5.0.50-CUDA-12.6.0/20260112_170914UTC
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc120
2025.06/scripts/gpu_support/nvidia/easystacks/eessi-2025.06-eb-5.2.0-CUDA-host-injections.yml
Jan 12 17:15:39 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen4+default
P: latency: 1.4 us (r:0, l:None, u:None)
[ OK ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen4+default
P: latency: 3.22 us (r:0, l:None, u:None)
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen4+default
P: latency: 0.15 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen4+default
P: bandwidth: 14559.34 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-121036.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
